{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a13ea924",
   "metadata": {},
   "source": [
    "# Tagging\n",
    "\n",
    "The tagging chain uses the OpenAI `functions` parameter to specify a schema to tag a document with. This helps us make sure that the model outputs exactly tags that we want, with their appropriate types.\n",
    "\n",
    "The tagging chain is to be used when we want to tag a passage with a specific attribute (i.e. what is the sentiment of this message?)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "bafb496a",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.4) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic\n",
    "from langchain.prompts import ChatPromptTemplate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "39f3ce3e",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "832ddcd9",
   "metadata": {},
   "source": [
    "## Simplest approach, only specifying type"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fc8d766",
   "metadata": {},
   "source": [
    "We can start by specifying a few properties with their expected type in our schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8329f943",
   "metadata": {},
   "outputs": [],
   "source": [
    "schema = {\n",
    "    \"properties\": {\n",
    "        \"sentiment\": {\"type\": \"string\"},\n",
    "        \"aggressiveness\": {\"type\": \"integer\"},\n",
    "        \"language\": {\"type\": \"string\"},\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "6146ae70",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = create_tagging_chain(schema, llm)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e306ca3",
   "metadata": {},
   "source": [
    "As we can see in the examples, it correctly interprets what we want but the results vary so that we get, for example, sentiments in different languages ('positive', 'enojado' etc.).\n",
    "\n",
    "We will see how to control these results in the next section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "5509b6a6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'sentiment': 'positive', 'language': 'Spanish'}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!\"\n",
    "chain.run(inp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "9154474c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'Spanish'}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Estoy muy enojado con vos! Te voy a dar tu merecido!\"\n",
    "chain.run(inp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "aae85b27",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'sentiment': 'positive', 'aggressiveness': 0, 'language': 'English'}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Weather is ok here, I can go outside without much more than a coat\"\n",
    "chain.run(inp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bebb2f83",
   "metadata": {},
   "source": [
    "## More control\n",
    "\n",
    "By being smart about how we define our schema we can have more control over the model's output. Specifically we can define:\n",
    "\n",
    "- possible values for each property\n",
    "- description to make sure that the model understands the property\n",
    "- required properties to be returned"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69ef0b9a",
   "metadata": {},
   "source": [
    "Following is an example of how we can use _enum_, _description_ and _required_ to control for each of the previously mentioned aspects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "6a5f7961",
   "metadata": {},
   "outputs": [],
   "source": [
    "schema = {\n",
    "    \"properties\": {\n",
    "        \"sentiment\": {\"type\": \"string\", \"enum\": [\"happy\", \"neutral\", \"sad\"]},\n",
    "        \"aggressiveness\": {\n",
    "            \"type\": \"integer\",\n",
    "            \"enum\": [1, 2, 3, 4, 5],\n",
    "            \"description\": \"describes how aggressive the statement is, the higher the number the more aggressive\",\n",
    "        },\n",
    "        \"language\": {\n",
    "            \"type\": \"string\",\n",
    "            \"enum\": [\"spanish\", \"english\", \"french\", \"german\", \"italian\"],\n",
    "        },\n",
    "    },\n",
    "    \"required\": [\"language\", \"sentiment\", \"aggressiveness\"],\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e5a5881f",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = create_tagging_chain(schema, llm)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ded2332",
   "metadata": {},
   "source": [
    "Now the answers are much better!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "d9b9d53d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'sentiment': 'happy', 'aggressiveness': 0, 'language': 'spanish'}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!\"\n",
    "chain.run(inp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "1c12fa00",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'sentiment': 'sad', 'aggressiveness': 10, 'language': 'spanish'}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Estoy muy enojado con vos! Te voy a dar tu merecido!\"\n",
    "chain.run(inp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "0bdfcb05",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'sentiment': 'neutral', 'aggressiveness': 0, 'language': 'english'}"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = \"Weather is ok here, I can go outside without much more than a coat\"\n",
    "chain.run(inp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e68ad17e",
   "metadata": {},
   "source": [
    "## Specifying schema with Pydantic"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f5970ec",
   "metadata": {},
   "source": [
    "We can also use a Pydantic schema to specify the required properties and types. We can also send other arguments, such as 'enum' or 'description' as can be seen in the example below.\n",
    "\n",
    "By using the `create_tagging_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
    "\n",
    "In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "bf1f367e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from enum import Enum\n",
    "from pydantic import BaseModel, Field"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "83a2e826",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Tags(BaseModel):\n",
    "    sentiment: str = Field(..., enum=[\"happy\", \"neutral\", \"sad\"])\n",
    "    aggressiveness: int = Field(\n",
    "        ...,\n",
    "        description=\"describes how aggressive the statement is, the higher the number the more aggressive\",\n",
    "        enum=[1, 2, 3, 4, 5],\n",
    "    )\n",
    "    language: str = Field(\n",
    "        ..., enum=[\"spanish\", \"english\", \"french\", \"german\", \"italian\"]\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "6e404892",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = create_tagging_chain_pydantic(Tags, llm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "b5fc43c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "inp = \"Estoy muy enojado con vos! Te voy a dar tu merecido!\"\n",
    "res = chain.run(inp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "5074bcc3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Tags(sentiment='sad', aggressiveness=10, language='spanish')"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "res"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
